The longitudinal course of childhood bullying victimization and associations with self‐injurious thoughts and behaviors in children and young people: A systematic review of the literature

Abstract

Introduction: Bullying victimization has consistently been highlighted as a risk factor for self‐injurious thoughts and behaviors (SITBs) in young people. This systematic review of prospective, community‐based studies explored associations between bullying victimization (traditional/face‐to‐face and cyber) across the full spectrum of self‐harm and suicidality, in children and young people aged up to (and including) 25 years. Importantly, associations by sex/gender were explored.

Methods: MEDLINE, Embase, PsycINFO, CINAHL and Scopus were searched for articles meeting the inclusion criteria. Articles were screened by title, abstract and full text. Quality appraisal was performed using the Newcastle‐Ottawa Scale for cohort studies. Data were synthesized narratively. The protocol is registered on PROSPERO (CRD42021261916) and followed PRISMA 2020 guidelines.

Results: A total of 35 papers were included, across 17 countries. Results were presented by bullying type: traditional/face‐to‐face (n = 25), cyber (n = 7) and/or an aggregate of both types (n = 7). Outcomes included suicidal ideation (n = 17), self‐harm (n = 10), suicide attempt (n = 4), NSSI (n = 4), other (n = 7). Studies measured outcomes in under 18s (n = 24), 18–25‐year‐olds (n = 8) and both under 18s and 18–25‐year‐olds (n = 3). Studies exploring the role of sex/gender (20%) found some interesting nuances.

Conclusions: Some weak to strong associations between bullying and SITBs were found yet conclusions are tentative due to study heterogeneity (e.g., methods used, conceptualizations and operationalisations of exposures/outcomes). Future research should address methodological issues raised in this review, and further explore gender differences in bullying, including by bullying sub‐types (e.g., overt or relational) and victim status (e.g., victim or bully‐victim).

behaviors (Holt et al., 2015), particularly where studies have adjusted for associated covariates. Second, by looking at community-based studies, as many cases remain undetected in the community, particularly self-harm (Geulayov et al., 2018; Hawton et al., 2012). Third, by focusing on studies that follow young people up to and including the age of 25, which meets the definition of the World Health Organization (2021), and given that development is thought to continue until this age (Sawyer et al., 2018). This is also in line with previous reviews (Abdelraheem et al., 2019; John et al., 2018; Williams et al., 2021). Fourth, by considering the broad spectrum of SITBs, and the specific construct of bullying victimization, rather than peer victimization more broadly. Finally, by considering whether the associations differ by sex/gender. As such, the overall aim is to summarize the longitudinal course of childhood bullying victimization and associations with self-harm, suicidal ideation and suicidal behaviors in children and young people, across nonclinical settings.

Objectives:
1. Is bullying victimization in childhood and adolescence associated with future self-harm, suicidal thoughts and suicidal behaviors in children and young people up to, and including, the age of 25?
2. Do these associations differ for type of bullying victimization (i.e., traditional bullying and/or cyberbullying victimization)?
3. Do these associations differ by sex/gender?

2 | METHODS

| Protocol and registration
This review follows the recommendations of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines (see Materials S1 for PRISMA checklist). The protocol was pre-registered on PROSPERO (CRD42021261916).

| Inclusion and exclusion criteria
Studies were required to meet the following inclusion criteria: (1) original, empirical research published in a peer reviewed journal; (2) examines the relationship between exposure to bullying victimization as a child or adolescent under 18 years old, and the outcome of self-harm or suicidal ideation or behavior as a child or young adult under 26 years old; (3) uses a longitudinal, prospective design with a minimum of two time points; (4) community-based studies; (5) written in English; (6) has a comparator (i.e., a group of bullied vs. nonbullied children). The main outcome was any form of self-harm (NSSI, self-poisoning, self-injury) or suicidal thoughts or behaviors (ideation, attempts). This study did not examine a particular type of bullying victimization; therefore, all direct and indirect forms of bullying, including cyberbullying, were included. Similar to the approach of Holt et al. (2015), studies described by the authors as measuring peer victimization and aggression more generally were not included. Studies were excluded if they only looked at bullying perpetration, and clinical samples were excluded to focus on understanding self-harm and suicidality in the community. Cross-sectional studies, case series, case reports, qualitative studies, opinion pieces, editorials, reviews, meta-analyses and intervention studies were not included.

| Search strategy
An electronic search of the following databases was run on July 6, 2021, limited to studies in the English language: MEDLINE (Ovid), Embase (Ovid), PsycINFO (Ovid), CINAHL and Scopus (Elsevier). A search string was developed to include relevant keywords when searching titles and abstracts, with subject heading searching where possible (see Materials S2).
Additionally, a manual search of reference lists from relevant published systematic reviews was conducted. Web of Science was used to undertake forward and backward citation searching of reference lists from included studies. Based on good practice guidance issued by PROSPERO, searches were re-run just before the final analyses and any further studies were identified and retrieved for inclusion.
Citations were imported into EndNote and duplicates removed, before being uploaded onto Rayyan (Ouzzani et al., 2016). Two researchers (EW and HC) independently reviewed 10% (n = 62) of titles and abstracts and checked agreement, before both screened the remaining 90% (n = 556) based on the inclusion criteria. The same process was repeated when reviewing the 78 papers at the full text stage. A third researcher (CGA) made the final decision if consensus was not reached. The search was re-run on April 15, 2022 and, due to high agreement in the first screening (>90%), EW independently conducted the updated search and consulted with HC about any papers causing uncertainty.

| Data extraction
Data from eligible studies were extracted into a predesigned form in Microsoft Excel based on predetermined criteria: key study details (author, year, country), setting (e.g., urban/rural), study design and duration of follow-up period (months or years, waves), sample characteristics (baseline sample size, final sample size, sex/gender, age, attrition rate), details about exposure (traditional, cyber or aggregate, or sub-types such as physical or relational bullying; measurement/scale used), details about outcome (NSSI, self-harm, suicidal ideation, suicide attempt or other; measurement/scale used), variables adjusted for/covariates, statistical analyses used (e.g., odds ratio [OR], risk ratio [RR]) and relevant results, including results stratified by sex/gender. Authors were contacted if key information could not be ascertained from the paper, its Supporting Information Materials or a previous paper referenced in the article that lists more details of the sample characteristics and study procedure.

| Quality assessment
Each study was subject to quality assessment using the Newcastle-Ottawa Quality Assessment Scale for cohort studies (NOS; Wells et al., 2014), with grading in the following categories: (1) selection of cohorts, including representativeness and ascertainment of bullying status; (2) comparability of cohorts, and the use of appropriate methods to control for confounding; (3) assessment of outcome, including adequacy of follow up (see Materials S3 for scoring sheet). Specifically, a follow up of 6 months and response rate of 80%, with an adequate description of participants lost to follow up, was deemed appropriate based on previous relevant reviews using NOS for longitudinal studies (Moore et al., 2017;Valencia-Agudo et al., 2018). The quality score for ascertainment of exposure and outcome was awarded when using secure records (e.g., medical records), structured interview or self-report questionnaires with validated measurements (Latham et al., 2021;Moore et al., 2017). Using a star grading system based on thresholds used in other reviews (Polihronis et al., 2022;Williams et al., 2021), studies received an overall quality score of low (0-3 stars), medium (4-6 stars) or high (7-9 stars).
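The star thresholds described above can be expressed as a small lookup. The following is a minimal sketch only: the function name `quality_band` is hypothetical, and the handling of non-integer totals that fall between bands (e.g., 3.5 or 6.5 stars) is an assumption, since the thresholds are stated for whole stars.

```python
def quality_band(stars: float) -> str:
    """Map a Newcastle-Ottawa star total (0-9) to a quality band.

    Bands follow the thresholds stated in the text: low (0-3),
    medium (4-6), high (7-9). Assumption: totals that fall between
    bands (e.g., 3.5 or 6.5) are treated as medium.
    """
    if not 0 <= stars <= 9:
        raise ValueError("NOS star totals range from 0 to 9")
    if stars <= 3:
        return "low"
    if stars < 7:
        return "medium"
    return "high"


# Illustrative totals only, not scores of specific included studies.
for total in (3, 4, 6, 7, 9):
    print(total, quality_band(total))
```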

| Data analysis
To aid comparability, studies were analysed and results reported based on exposure type: traditional bullying victimization only, cyberbullying victimization only, bullying victimization (all types). Papers could appear in more than one group if they separated results by type of bullying (e.g., presenting results for traditional bullying separately to cyberbullying). Results were categorized as measuring traditional bullying if: (1) this was explicitly stated (e.g., they measure face-to-face overt, physical or relational bullying but not cyberbullying); (2) if the data was collected before the 2000s (as cyberbullying is a modern concept); or (3) if the validated scale or items did not explicitly refer to electronic bullying (i.e., they were developed to measure traditional bullying; Smith et al., 2008). Results for cyberbullying included studies that gave results for this specific type of bullying and the association with the outcomes. Results were classified as "bullying victimization (all types)" if the study used an aggregate measure (i.e., grouping traditional and cyberbullying together), or if their methods section was too vague to ascertain how bullying was measured.
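The grouping rules in this paragraph can be sketched as a simple decision function. This is an illustration under stated assumptions rather than the procedure actually used: the `StudyMeasure` fields and the function `exposure_groups` are hypothetical names, and "before the 2000s" is interpreted here as a data collection year earlier than 2000.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyMeasure:
    """Hypothetical summary of how one paper measured bullying."""
    explicit_traditional: bool = False      # states face-to-face bullying only
    cyber_results: bool = False             # reports cyberbullying-specific results
    aggregate_measure: bool = False         # traditional and cyber combined
    data_collected_year: Optional[int] = None
    scale_mentions_electronic: Optional[bool] = None  # None = cannot be determined


def exposure_groups(m: StudyMeasure) -> set[str]:
    """Return every exposure group a paper contributes results to."""
    groups: set[str] = set()
    if m.cyber_results:
        groups.add("cyber")
    if m.aggregate_measure:
        groups.add("all types")
    is_traditional = (
        m.explicit_traditional
        or (m.data_collected_year is not None and m.data_collected_year < 2000)
        or m.scale_mentions_electronic is False
    )
    if is_traditional:
        groups.add("traditional")
    if not groups:
        # Methods too vague to tell how bullying was measured.
        groups.add("all types")
    return groups


# A paper reporting separate analyses appears in two groups.
print(sorted(exposure_groups(StudyMeasure(explicit_traditional=True, cyber_results=True))))
```

Returning a set reflects the rule that a paper may appear in more than one group when it separates results by bullying type.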
Additionally, supplementary analyses assessed the measures used to capture bullying; specifically, whether a definition was provided to participants and whether the measure captured the three components of bullying (i.e., power imbalance, repetition, intention to cause harm). Authors were contacted if this information could not be ascertained from the manuscript.
A meta-analysis was not performed due to heterogeneity between studies in the exposure and outcomes assessed, and the measures used.

| Study selection
A total of 1383 records were identified through searching five academic databases. An additional three articles were identified through searching the reference lists of previous relevant systematic reviews, and through forward/backward citation searching of articles included in the current review using Web of Science. After 768 duplicates were removed, the titles and abstracts of 618 studies were screened, resulting in 81 studies for full text review. Articles were excluded if they included the wrong age group (n = 5), did not use a prospective methodology or community-based sample (n = 19), did not have a suitable comparator to explore the association between exposure and outcome (e.g., if all the cohort are victims of bullying, n = 1), measured the wrong exposure (e.g., sexual victimization or peer victimization rather than bullying victimization, n = 18) or the wrong outcome (n = 3). A total of 35 articles were included in the final review for qualitative synthesis. Figure 1 details the process in a PRISMA flow chart.
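The flow of records through the selection stages can be cross-checked with simple arithmetic. The sketch below uses only counts reported in this review; the variable names are illustrative. The number of records remaining after de-duplication also matches the sum of the two screening batches described in the search strategy (62 + 556).

```python
# Counts reported in the study selection paragraph.
identified_databases = 1383
identified_other_sources = 3
duplicates_removed = 768
full_text_reviewed = 81
excluded_at_full_text = {
    "wrong age group": 5,
    "not prospective or not community-based": 19,
    "no suitable comparator": 1,
    "wrong exposure": 18,
    "wrong outcome": 3,
}

# Records left for title/abstract screening after de-duplication.
screened = identified_databases + identified_other_sources - duplicates_removed

# Papers remaining after full-text exclusions.
included = full_text_reviewed - sum(excluded_at_full_text.values())

print(screened)  # 618
print(included)  # 35
```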

| Research design of studies
The included studies represented 17 countries: 5 articles were from the United Kingdom; 4 from Finland; 3 from Korea; 2 each from Australia, Belgium, Canada, China, Norway, United States, Vietnam; 1 each from New Zealand, Israel, Spain, Sweden, Switzerland, Taiwan, The Netherlands. Additionally, there was one study of 10 European countries (Brunstein Klomek et al., 2019), and one study that looked at two samples, one from the United States and one from the United Kingdom (Lereya et al., 2015). All studies were longitudinal with a minimum of two waves of data collection, and overall study duration ranged from 4 months (Quintana-Orts et al., 2022) to 17 years (Copeland et al., 2013; Lereya et al., 2015). Most studies used univariable (e.g., logistic regression) and multivariable (e.g., multiple logistic regression) inferential statistics, presenting unadjusted and adjusted results (e.g., ORs, RRs). Variables commonly controlled or adjusted for included sex/gender, age, socioeconomic status and baseline mental health, including the outcome of interest. Some studies used structural equation models, cross-lagged panel analysis and/or path analysis instead of, or in addition to, logistic/linear regression models (Brunstein Klomek et al., 2019; Cho & Glassner, 2020; Cho, 2019; Garisch & Wilson, 2015; Le et al., 2019; Lereya et al., 2013; Lung et al., 2020; Zhu et al., 2021).

| Sample characteristics
Twenty-two included studies used samples that were completely unique to their paper. Thirteen papers used samples that were similar to the samples of another paper due to being part of the same cohort study. However, they had slightly different sample sizes for the purpose of analysis based on the following reasons: inclusion of future waves (Lereya et al., 2013, 2015; Sigurdson et al. […]

[Table excerpt] Outcome measure: (Paykel et al., 1974), T2 (past 3 months), self-report questionnaire (in class), T3 (past 12 months). Analysis: multilevel autoregressive cross-lagged models, ORs with beta values. Covariates: gender, age, whether the adolescent was living without his biological parents (yes, no), whether the adolescent is an immigrant (yes, no), and whether the adolescent's parents lost their job during the last 12 months (yes, no) were included as covariates to account for their effects (Wasserman et al., 2015). In the models predicting suicide ideation and/or suicide attempts, depression was included as a covariate.

| Exposure and outcome: Definition and types, assessment measurements, and methods
Across the 35 studies, 13 provided a definition of bullying to participants, 14 did not and in 6 studies it was unclear if a definition was provided. Of the 13 studies with a definition, eight contained all three components of bullying (i.e., power imbalance, intention to harm, repetition). Within the measure itself, 8 studies captured power imbalance, 24 studies captured intention to harm and 32 studies captured repetition, although 15 studies classified bullying at a lower threshold than Solberg and Olweus's (2003) frequently cited value (i.e., less than two or three times a month). An additional study used a lower threshold for cyberbullying but not traditional bullying (Perret et al., 2020). Eight studies included definitions and measures that captured all three elements of bullying (Blasco et al., 2019; Fisher et al., 2012; Garisch & Wilson, 2015; Heikkilä et al., 2013; Kiekens et al., 2019; Le et al., 2017, 2019; Mortier et al., 2017). These studies used the following measures: Bully Survey (Swearer & Cary, 2003), the authors' own measure previously used with good test-retest reliability (Fisher et al., 2012), two questions from the World Health Organization Youth Health Study (King et al., 1996), Revised Olweus Bully/Victim Questionnaire (Olweus, 1996), Peer Relations Questionnaire (Rigby & Slee, 1995). See Materials S5 for an overview of definitions and measures of each study.
Of the 11 studies that looked at both forms of bullying (i.e., traditional, face-to-face bullying and cyberbullying), 7 studies reported an aggregated measure (i.e., both types of bullying combined into one measure; Borschmann et al., 2020; Garisch & Wilson, 2015; Kiekens et al., 2019; Le et al., 2017, 2019; Lung et al., 2020; O'Connor et al., 2009) and 4 reported a disaggregated measure (i.e., conducting and reporting separate analyses for each type; Bannink et al., 2014; Benatov et al., 2021; Hemphill et al., 2015; Perret et al., 2020). The conceptualization of bullying in two studies (Lung et al., 2020; O'Connor et al., 2009) which used their own measure was vague and reported in this review as an aggregate measure of bullying due to limited information provided in the manuscript as to whether the questions asked were inclusive of cyberbullying or not. Two studies reported on the risk for different sub-types of traditional bullying (e.g., relational, physical, verbal). Outcomes for bully perpetrators only are outside the remit of this systematic review. Nine studies used multi-informant methods that included a combination of child and/or parent and/or teacher reports of bullying (Copeland et al., 2013; Fisher et al., 2012; Klomek et al., 2008, 2009; Lereya et al., 2013, 2015; Silberg et al., 2016; Sourander et al., 2006; Winsper et al., 2012). There was considerable heterogeneity in definitions and measurement of outcomes that ranged across the spectrum of self-harm and suicidal thoughts and behaviors. Six studies reported more than one SITB outcome (e.g., the authors looked at self-harm and suicidal thoughts and/or behaviors; Benatov et al., 2021; Brunstein Klomek et al., 2019; Kim et al., 2009; Mortier et al., 2017; Sigurdson et al., 2018; Winsper et al., 2012).

| Quality assessment
The methodological quality of studies ranged from 3 to 8.5 (M = 6.3), out of a possible range of 0 to 9 (see Materials S6). One was categorized as low (0-3), 21 as medium (4-6) and 14 as high quality (7-9). Studies performed well in representativeness of their cohorts and the majority adjusted for confounders, aiming to minimize inaccurate conclusions from spurious associations. However, many studies failed to control for the outcome at the start of the study, limiting conclusions around causality as it could not be guaranteed that bullying preceded the outcome. There was considerable heterogeneity in choice of measurements to ascertain exposure and/or outcome, with many using unvalidated measures. Follow up times were good, with only two studies less than 6 months (Garisch & Wilson, 2015; Quintana-Orts et al., 2022). High attrition rates increase the risk of bias, and this varied across studies, from 3% (Kim et al., 2009) to 62% (Bannink et al., 2014). However, drop out was often accounted for through attrition analysis and/or adjustment (e.g., using weights) where appropriate. Four studies did not report rates of attrition (Cho, 2019; Geoffroy et al., 2021; Lereya et al., 2013; Winsper et al., 2012).

| Association between cyberbullying only and SITB
Seven studies measured associations between cyberbullying and the following outcomes: NSSI (n = 1), self-harm (n = 2), suicidal ideation (n = 3), suicide attempt (n = 1), aggregate of suicidal ideation/attempt (n = 1). Table 4 provides an overview of the main associations, the range of effect sizes and references to the included studies. Cyberbullying was measured at ages 12-13 in three studies (Bannink et al., 2014; Perret et al., 2020; Zhu et al., 2021), at age 15 in two studies (Benatov et al., 2022; Hemphill et al., 2015) and at 18 years in another (Mars et al., 2020). Outcomes were collected under 18 years of age except one (Mars et al., 2020) and one study reported results for both bully-victims and victims only (Hemphill et al., 2015). Across included studies that reported ORs (n = 5), the effect of cyberbullying on the various measures of SITBs ranged from aORs 0.87, 95% CI [0.36, 2.11] to 2.42 [1.41, 4.15]. Two studies reported β values, ranging from β = .04 to .38. Associations were largest for young women, and with self-harm and suicidal ideation. Definitions of cyberbullying were provided in two studies (Benatov et al., 2022; Perret et al., 2020) and none of the studies captured "power imbalance" in their measures. Materials S4 and S7 provide further study-specific information.

| Association between bullying (aggregate of traditional and cyberbullying) and SITB
Of the seven studies which aggregated all forms of bullying, five explicitly stated the use of a measure that aggregated items on traditional and cyberbullying (Borschmann et al., 2020; Garisch & Wilson, 2015; Kiekens et al., 2019; Le et al., 2017, 2019), and two were assumed to be an aggregate as they used their own measure with an unreported or broad definition (Lung et al., 2020; O'Connor et al., 2009). Table 5 provides an overview of the main associations, the range of effect sizes and references to the included studies. Outcomes were NSSI (n = 2), self-harm (n = 3) and suicidal ideation (n = 2). Six studies measured outcomes in under 18s (Borschmann et al., 2020; Garisch & Wilson, 2015; Le et al., 2017; Le et al., 2019; Lung et al., 2020; O'Connor et al., 2009) and one in over 18s (Kiekens et al., 2019). Across the included studies that reported adjusted ORs (aOR) (n = 5), the effect of (aggregated) bullying on the various measures of SITBs ranged from aORs 1. […]
(i.e., bully-victims; Le et al., 2017;Le et al., 2019), and for those frequently victimized (Borschmann et al., 2020), although some samples had wide confidence intervals. This may be explained by failing to provide a definition of bullying to participants (e.g., Borschmann et al., 2020). Larger effect sizes within the same sample of participants may be explained by having a lower threshold for frequency of bullying. For example, Le et al. (2019) used a threshold of "once or twice a month" whereas Le et al. (2017) used the cutoff point "a few times a month." Materials S4 and S7 provide further study-specific information.
3.9 | Influence of sex/gender on the association between bullying and SITB
Sex and/or gender was often included in multivariable models as a control variable and many studies provided prevalence rates of bullying and/or SITB by sex/gender (see Materials S7 and S8).
Four studies looked at whether sex/gender acted as a moderator in the association between bullying and SITB by adding an interaction term into their models (Bannink et al., 2014;Copeland et al., 2013;Perret et al., 2020;Sigurdson et al., 2018). One reported nonstatistically significant interactions without specifying the effect size (Perret et al., 2020), one did not report on the interaction terms (Sigurdson et al., 2018), and the other did not specify the interaction terms but stratified significant interactions by gender (Copeland et al., 2013). The final study, looking at suicidal ideation, reported small interactions for gender × traditional bullying (aOR: 1.41, 95% CI: [0.83, 2.33], p = .20) and gender × cyberbullying (aOR: 1.39 [0.56, 3.45], p = .48) but did not stratify the results due to a nonstatistically significant interaction (Bannink et al., 2014). Additionally, one study looked at the direct and indirect associations between sex and self-harm via being bullied using path analysis (Lereya et al., 2013).
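As a worked illustration of how an interaction odds ratio, its confidence interval and its p value relate, an approximate two-sided p value can be recovered from the reported aOR and 95% CI. This is a sketch under stated assumptions, not the original authors' analysis: it assumes a Wald-type interval that is symmetric on the log-odds scale, and the function name `p_from_or_ci` is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float) -> float:
    """Approximate two-sided p value from an odds ratio and its 95% CI.

    Assumes a Wald interval: log(OR) +/- 1.96 * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Interaction terms reported by Bannink et al. (2014), as quoted above.
p_traditional = p_from_or_ci(1.41, 0.83, 2.33)
p_cyber = p_from_or_ci(1.39, 0.56, 3.45)
print(round(p_traditional, 2), round(p_cyber, 2))
```

Under these assumptions, the recovered values are close to the reported p = .20 and p = .48, and both intervals include 1, consistent with nonstatistically significant interactions.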
Four studies stratified all findings by sex/gender and did not present unstratified results (Kim et al., 2009; Klomek et al., 2009; Mars et al., 2020; Sigurdson et al., 2018), and three studies presented results that were stratified and unstratified by sex/gender (Fisher et al., 2012; Le et al., 2017, 2019). Four studies looked at associations with self-harm, four with suicidal ideation, and four with other suicidal behaviors.

| Associations between bullying and self-harm
The association with self-harm by sex/gender was explored in one study of young adults (Mars et al., 2020), two studies in pre- to early adolescence (Fisher et al., 2012; Lereya et al., 2013), and a final study which looked at both mid-adolescence and young adults (Sigurdson et al., 2018). In Mars et al. (2020), the association of cyberbullying with self-harm was stronger for young women (aOR: 2.42, 95% CI: [1.41, 4.15]) than young men (aOR: 1.59 [0.35, 7.26]), while in Sigurdson et al. (2018), traditional bullying had a stronger association with self-harm for young men (aOR: 3.86, 95% CI: [1.31, 11.41], p = .014) than women (aOR: 1.91, 95% CI: [1.01, 3.63], p = .047) although there was greater variability in scores for young men. The risk of self-harm aged 12 after being bullied in preadolescence was high for both boys and girls (Fisher et al., 2012), with the associations strongest for boys when bullying was reported by the mother (RR: 4.92, 95% CI: [2.33, 10.40]) and strongest for girls when reported by the children themselves (RR: 4.16 [1.93, 8.95]). Finally, in a study using path analysis (Lereya et al., 2013), boys were significantly more likely to be bullied and girls more likely to self-harm, and the association between the sex of the child and self-harm via being bullied was stronger for boys (β = −.04, SE = 0.01, p = .001).

| Summary
The associations between bullying and/or peer victimization and the different components of SITBs have been explored in recent decades (Hong et al., 2015; John et al., 2018; Kim & Leventhal, 2008; Serafini et al., 2021). This review extends the literature by focusing on bullying victimization (i.e., characterized by repetition, power imbalance and intention to harm), separated by sub-types (i.e., traditional, cyber or aggregated measures of both) across the broad spectrum of SITBs (from NSSI to attempted or completed suicide) in longitudinal studies that include children and adolescents as well as young adults. Additionally, this review looks at whether these associations differ by sex/gender.
Bullying was frequently associated with SITBs, with small to large associations, most often in mid-adolescence. This is generally unsurprising, due to profound developmental changes at this age and the wider influence of the social environment (Pfeifer & Allen, 2021). Traditional forms of bullying have also been reported as most present during early to mid-adolescence (Kowalski et al., 2014). Most studies looked at the association between traditional bullying and suicidal ideation and/or self-harm, or aggregated measures of self-harm and suicidality. Fewer studies looked at the impact of bullying on NSSI, suicide attempts and completed suicide, outcomes among young adults, or the long-term impact of cyberbullying. There were no noteworthy differences across countries, nor for studies with longer timespans, nor between smaller and larger studies. Findings were often mixed, with heterogeneity in study design and variables being explored. Some studies focused on traditional bullying or cyberbullying (or both combined), some explored the effects for bully-victims and victims separately, some chose different confounders (or none at all) and few stratified their findings by sex or gender. For this reason, it is difficult to present consistent patterns of findings that can be generalized across groups.
With the few studies that stratified by sex/gender, this review has found strongest associations between bullying and suicide attempts in older adolescent boys and young men (particularly bully-victims), and bullying and self-harm and suicidal ideation in girls and young women. Despite the heterogeneity of findings in this review, this study highlights the importance of investigating the experience of different types of bullying (e.g., traditional bullying or cyberbullying; overt vs. relational bullying) and its frequency/chronicity, on different types of victims (i.e., those who are only victims or also perpetrators of bullying), at different ages, with results stratified by sex/gender. Future studies which provide this level of detail may help to better tailor any anti-bullying prevention and intervention programs, rather than assuming victims are one homogenous group.
This review also extended previous reviews by looking at the spectrum of youth, from childhood into young adulthood. Although most studies looked at outcomes in childhood and adolescence (i.e., under 18), some studies explored and found negative outcomes for young adults who were victimized many years before. This included suicidal ideation in a twin study, and in young women, as well as self-harm and suicide attempts among young men. Indeed, it is thought that life events that take place during periods of transition such as early to mid-adolescence may have a longer-lasting effect (de Moor et al., 2019;Graber et al., 2018). For example, research on self-harm is often focused on teenagers, but research suggests older adults who self-harm often have a history of this behavior (Troya et al., 2019), highlighting the importance of continued follow up to better understand the long-term impact of victimization across the lifespan.
Overall, the studies were of good quality, and study quality was not related to outcomes. The highest rated studies considered multiple confounding factors and used well-defined and validated measures to assess exposures and outcomes. Importantly, these studies also provided participants with a definition of bullying, and used measures that captured the three elements of bullying (i.e., power imbalance, intention to harm, repetition). Studies had smaller effect sizes when they scored lowest on the quality assessment and/or used measures that lacked a definition/examples of bullying or failed to capture several of the core elements (for example, a single-item question in a large survey). Although fewer than half of the studies controlled for baseline levels of the outcome (an issue for inferring causality), they regularly found small to large effect sizes, tentatively supporting directionality between bullying and SITBs. Future studies should account for this in the design or analysis stage.
Many studies failed to capture the component "power imbalance" within their measure of bullying, despite incorporating this within the definition of bullying at the start of their manuscript. This supports findings from a previous review on bullying measures (Vivolo-Kantor et al., 2014) and is important because "power imbalance" is one of the two elements that differentiates bullying from peer victimization. Only two studies failed to capture any of the components of bullying in their measure (Lung et al., 2020;O'Connor et al., 2009), also scoring lower quality assessment scores, and were given less weight in the review's overall conclusions.

| Associations by type of bullying (traditional vs. cyber)
In this review, 21 studies measured only traditional, face-to-face bullying (2 of which looked at sub-types of traditional bullying), 3 studies measured only cyberbullying, and 4 studies looked at both forms but presented separate analyses. Additionally, 7 studies looked at both forms in a combined measure. Due to the small numbers, it is difficult to draw confident conclusions about differences between the types although tentatively it may appear there were slightly weaker effects for cyberbullying, in studies that mostly measured outcomes in adolescence rather than young adulthood.
Although associations between cyberbullying and SITBs were found in mid-adolescence after controlling for sociodemographic factors and baseline depression, many of these effects reduced after adjusting for baseline suicidality and/or traditional forms of bullying. The two forms of bullying are often associated (Kowalski et al., 2019; Zych & Farrington, 2021), with previous cross-sectional studies finding that cyberbullying explains some variance in outcomes, particularly suicidal ideation, above and beyond traditional bullying (Kowalski et al., 2014; van Geel et al., 2014). One explanation for our results is the younger age range of our studies due to the requirement of cyberbullying occurring before 18 years old; negative outcomes from cyberbullying may appear later than those from traditional bullying (Bannink et al., 2014), and the prevalence of cyberbullying is thought to peak in mid-adolescence but may reappear in young adulthood. For this reason, future studies may wish to explore cyberbullying that starts in young adulthood, supporting previous recommendations (Kowalski et al., 2019).
Additionally, the two studies that looked at sub-types of traditional bullying had some striking findings. Victims of chronic physical bullying (i.e., bullying that persists over time) may have over seven times the risk of suicide attempts in mid-adolescence compared to non-victims (Brunstein Klomek et al., 2019). Worryingly, at the age of 11, victims of overt bullying (e.g., physical and verbal) may be 2.5 times more likely to engage in suicidal or self-injurious behaviors. There is clearly scope for future longitudinal research to examine bullying sub-types, to better understand patterns of behavior among different groups of young people.

| Associations by sex/gender
In the present review, less than 20% of studies looked at the moderating effect of sex/gender in the association between bullying and SITBs. Similar to previous reviews (Heerde & Hemphill, 2019; Holt et al., 2015; John et al., 2018), the role of sex/gender in these associations was not fully clear. However, closer inspection revealed some interesting patterns worth further exploration. First, although rates of bullying did not differ greatly between boys and girls, boys were more likely to be bully-victims, a group at higher risk of negative mental health outcomes compared to pure victims or bullies (Hunter et al., 2007; Menin et al., 2021). The small number of papers that stratified results by sex/gender and victim status found some alarmingly high rates of suicidal behaviors in boys and young men, particularly bully-victims. This group may be at greatest risk due to experiencing both internalizing and externalizing behaviors, warranting further study (Kelly et al., 2015), for example by incorporating measures to assess co-occurrence of bullying victimization and perpetration (Jadambaa et al., 2019) and by looking at the influence of sex/gender. Second, only two studies looked at outcomes from specific sub-types of traditional, face-to-face bullying, an approach that helps clarify any nuances in the association between bullying and SITBs across genders. Research suggests that girls are more likely to be victims of relational bullying (e.g., gossiping or exclusion), and boys of physical bullying (Crick & Bigbee, 1998). Relational bullying has shown a stronger link with suicidal ideation, while physical bullying is more associated with suicidal acts (Van der Wal et al., 2003; Zhao & Yao, 2022).
Repeated exposure to physical bullying may increase tolerance to pain; in turn, this may provide the acquired capability to transition from suicidal ideation to acts, according to the interpersonal theory of suicide (Brunstein Klomek et al., 2019; Joiner, 2007). This may explain our findings that traditional bullying in boys may have a stronger association with suicidal behaviors in late adolescence and early adulthood, whereas it is more strongly associated with ideation in girls and young women. However, these conclusions cannot be confirmed in the present review, as few studies looked at the association between bullying sub-types and SITBs. It is clear that future studies would benefit from stratifying by sex/gender and looking at sub-types of bullying to better understand the trajectories over time, which could inform bullying prevention strategies and more tailored support.

| Methodological issues
Although all included studies were longitudinal and community-based, there was great heterogeneity in the measurements used, the scope and definitions of the key concepts, whether the outcome was controlled at baseline, and the statistical methods used to interpret the results (specifically, reliance on p values).
A range of measurements were used to ascertain bullying and the SITBs, many of them unvalidated, particularly for the outcome(s). Indeed, several studies aggregated multiple outcomes (e.g., self-harm and suicidal ideation, or simply "suicidality"), resulting in 10 unique types of outcome overall. Studies rated higher in quality regularly provided a clear definition, with examples, of bullying and/or the outcome to study participants, with authors signposting the reader to an example text. This is important for ensuring the correct concept is being measured.
Moreover, with cyberbullying being another, potentially more subtle, form of bullying, well-defined, validated self-report measures are clearly necessary to gain an accurate picture of the extent of the problem. Providing a clear working definition tailored to the target audience and/or presenting participants with a list of experiences is one such step, alongside focus groups with young people themselves to prevent any disconnect with researchers' definitions (Furlong et al., 2010; Menin et al., 2021). First, this will help to better understand who is most at risk. Second, traditional bullying prevention programs can then be adapted to better address the nuances of cyberbullying (Olweus et al., 2019).
This review found that who reports on the bullying is an important consideration. In the two studies (ALSPAC and E-Risk) that presented findings according to whether bullying aged 7-10 years was reported by the child, mother and/or teacher, associations were strongest for boys when the mother and teacher reports were included. It has been suggested that indirect forms of bullying may be more subtle and missed by adults, possibly underestimating bullying in these children, who are more likely to be girls (Husky et al., 2022). Despite the methodological limitations of self-report data, these measures may therefore better capture power imbalance and intention to harm that other informants may miss (Furlong et al., 2010; Jadambaa et al., 2019).
Longitudinal research aims to enhance the ability to draw conclusions about causality, for example by ensuring the exposure precedes the outcome. Observational study designs are most appropriate for harmful exposures, because manipulating exposure to bullying would be unethical. Unfortunately, only 11 out of 35 studies controlled for baseline levels of the outcome under investigation. This provides less certainty that bullying preceded self-harm or suicidal behavior and leaves potential for reverse causality (i.e., a person displaying self-harming or suicidal behaviors may become a target of bullying).
Finally, many studies drew conclusions based on p values rather than interpreting effect sizes, and confidence intervals were often missing. At times, this limited the opportunity to draw meaningful inferences about the wider population with any degree of certainty.
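To illustrate why an effect size with its confidence interval is more informative than a p value alone, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2×2 table. The function name and all counts are hypothetical, invented purely for illustration; they are not taken from any study in this review, and the Wald interval is only one of several possible interval methods.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed (bullied) with outcome      b: exposed without outcome
    c: unexposed with outcome              d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("All cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: same proportions, tenfold difference in sample size
small_study = odds_ratio_with_ci(6, 44, 20, 330)
large_study = odds_ratio_with_ci(60, 440, 200, 3300)

for label, (est, lo, hi) in [("small", small_study), ("large", large_study)]:
    print(f"{label}: OR = {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

In this invented example both samples give the same odds ratio, but only the larger one has an interval that excludes 1. A reader relying on significance alone would call the smaller study "null", whereas the estimate and interval show an effect of the same size measured imprecisely.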

| Future research
This review found several gaps in the literature that future studies should address.
First, there are very few prospective, longitudinal studies that look at cyberbullying and SITBs. Prevalence rates of cyberbullying often appear consistently lower than those of traditional bullying (Modecki et al., 2014), and the power to detect statistically significant differences is reduced when looking at an uncommon exposure such as cyberbullying and an uncommon outcome such as suicidality (Bannink et al., 2014). For this reason, future studies should not draw conclusions based solely on statistical significance testing. Rather, strength of effects should be explored, and qualitative studies should be conducted that have the potential to generate greater depth of understanding about the experience of being a victim of cyberbullying. Moreover, previous studies suggest suicides linked to cyberbullying are often associated with other proximal risk factors (Hinduja & Patchin, 2010). Research should aim to better understand the different environmental factors which may work together and exacerbate feelings of perceived burdensomeness and thwarted belongingness (elements of suicidal ideation), and ways to reduce this risk (Joiner, 2007).
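The loss of power with an uncommon exposure can be sketched with a normal-approximation power calculation for comparing two proportions. Everything in the block is an assumption made for illustration: the function name, the cohort size of 2000, the outcome risks of 4% and 8%, and the exposure prevalences of 25% and 5% are hypothetical and do not come from the included studies.

```python
import math
from statistics import NormalDist


def approx_power(p_unexposed, p_exposed, n_unexposed, n_exposed, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in two
    proportions, using the unpooled normal approximation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    se = math.sqrt(
        p_unexposed * (1 - p_unexposed) / n_unexposed
        + p_exposed * (1 - p_exposed) / n_exposed
    )
    shift = abs(p_exposed - p_unexposed) / se
    # Probability that the test statistic falls in either rejection region
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Hypothetical cohort of 2000 with the same outcome risks in both scenarios
cohort = 2000
risk_unexposed, risk_exposed = 0.04, 0.08

# Scenario 1: a more common exposure (25% of the cohort exposed)
power_common = approx_power(risk_unexposed, risk_exposed, 1500, 500)
# Scenario 2: an uncommon exposure (5% of the cohort exposed)
power_rare = approx_power(risk_unexposed, risk_exposed, 1900, 100)

print(f"power, common exposure: {power_common:.2f}")
print(f"power, rare exposure:   {power_rare:.2f}")
```

Under these assumed values the true doubling of risk is identical in both scenarios, yet the study with the rarer exposure is far less likely to reach statistical significance, which is why the strength of effects should be reported alongside any significance test.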
Second, future studies should collect data on socioeconomic status and ethnicity, as these were captured in less than a quarter of studies. It is, therefore, unclear if the association between bullying and SITBs can be generalized across sociodemographic groups. There may be specific nuances faced within or between people of different ethnicities (Kuldas et al., 2021), with victimization greater among poorer students (Hosozawa et al., 2021). Indeed, current in-school bullying prevention programs may be less effective in minority ethnic groups, highlighting the importance of understanding the nature of bullying across diverse populations. Moreover, studies would benefit from recruiting samples in diverse, urban areas, in preparation for the future direction of global development (i.e., greater urbanization). This is particularly important as 55% of the world's population lives in urban areas, a proportion expected to increase to 68% by 2050 (Valencia-Agudo et al., 2018).
Finally, given the potential nuances in experiences of bullying by boys and girls, as highlighted above, future studies would benefit from stratifying results by sex/gender. Small sample sizes may result in false negative findings, once again highlighting the limitations of relying on p values when interpreting, for example, gender × bullying interaction terms. The few studies which stratified results in this review had some interesting findings which may have been missed had adolescents been treated as a homogeneous group with respect to sex/gender.

| Strengths and limitations
A key strength of this review is the focus on prospective studies, which can better explore the direction of effect between bullying and SITBs.
There is ongoing discussion about whether cyberbullying and traditional face-to-face bullying are distinct or overlapping constructs (Olweus, 2010; Walker et al., 2013), and this review adds to the literature by presenting results grouped based on the original conceptualizations of authors and the respective measures, as recommended in a previous review (Camerini et al., 2020). Few prospective studies have looked at cyberbullying and SITBs; such research is greatly warranted given the rise of smartphone ownership among young people and the need to disentangle causal relationships from findings in cross-sectional studies (John et al., 2018).
As with any review, there are limitations. There is debate about whether cyberbullying should be better conceptualized as cyberaggression, and the current review took a restrictive approach that drew on the traditional definition of bullying. Moreover, the age limit was restricted to under 18s for the exposure, despite cyberbullying potentially continuing into young adulthood. Although few studies were excluded on this basis, some relevant studies may have been missed. The quality assessment scores should be interpreted with some caution; across similar reviews, many scales have been heavily adapted (Epstein et al., 2020; Latham et al., 2021; Moore et al., 2017), suggesting they may not be fit for purpose in studies looking at bullying and suicidality.
Finally, the decision to focus on bullying victimization was based on arguments in the literature (Furlong et al., 2010), and studies looking at the broader construct of peer victimization were excluded. The decision to restrict exposure types in this way may have excluded some studies looking at peer victimization with findings of some relevance. Moreover, there remains great heterogeneity of measurement in the bullying literature (Vivolo-Kantor et al., 2014), with many studies failing to capture the three core components on which researchers generally agree: repetition, intention to harm and power imbalance (Farrington, 1993; Olweus, 1994, 2010; Smith & Brain, 2000; Younan, 2019). Although the present review only included papers with measures explicitly stated as "bullying," there is the possibility that some studies may in fact be capturing peer victimization. Greater precision in the terms being measured is required, building on existing good practice and drawing on arguments in the literature (Furlong et al., 2010; Quinlan et al., 2020).

| CONCLUSION
The present review has found prospective associations of varying effect sizes between bullying and self-harm and suicidal thoughts and behaviors. The field is marked by great methodological heterogeneity, making it difficult to draw concrete conclusions. Future research should aim to capture the nuances of bullying (e.g., by sub-type and frequency) and its impact across the spectrum of SITBs, at different ages, among bullies, victims and bully-victims. Importantly, results should be stratified by sex/gender, to better understand the complex dynamics that could be targeted in anti-bullying interventions, and to tailor support for victims of bullying.